- BUSINESS, Page 58
- Ghost in the Machine
-
-
- The nine-hour breakdown of AT&T's long-distance telephone
- network dramatizes the vulnerability of complex computer systems
- everywhere
-
- By PHILIP ELMER-DEWITT -- Reported by Thomas McCarroll/New York
- and Paul A. Witteman/San Francisco
-
-
- The first sign that something had gone haywire in AT&T's
- long-distance telephone network came at 2:25 p.m. last Monday,
- when the giant map of the U.S. in the company's operations
- center in New Jersey began to light up like a football
- scoreboard. For reasons still being investigated, a computer in
- New York City had come to believe it was overloaded with calls,
- and it started to reject them. Alerted to New York's troubles,
- dozens of backup computers across the U.S. automatically
- switched in to take up the slack -- only to exhibit the same
- bizarre symptoms. People trying to place long-distance calls all
- over the world suddenly began to hear busy signals and recorded
- messages blandly informing them that "all circuits" were busy.
-
- Thus began the worst computer breakdown in the history of
- the U.S. telephone system. The incident was also a vivid
- reminder of how susceptible America, and the world, has become
- to computer failures -- natural and man-made. In 20 years of
- intensive automation, everything from supermarkets to stock
- exchanges has been computerized. Last week businesses and
- consumers were forced to face up to a downside of technology
- that becomes apparent only when the new systems fail. Said
- Steven Idelman, chairman of Omaha-based Idelman Telemarketing:
- "When things go wrong in a computer environment, they go wrong
- in a big way."
-
- Things stayed wrong at AT&T for nine hours last week. Of the
- 148 million long-distance and 800-number calls placed with the
- company that day, only 50% got through. Hotels lost bookings.
- Cars went unrented. The number of calls to the American Airlines
- reservation system fell two-thirds. Idelman had to send 800
- phone workers home for the day; he estimates he lost about
- $75,000 in sales. All told, the breakdown cost AT&T some $60
- million to $75 million in lost revenues. Said AT&T Chairman
- Robert Allen: "It was the worst nightmare I've had in 32 years
- in the business."
-
- Phone-company technicians traced the problem to a single
- "failure of logic" in the computer programs that route calls
- through the AT&T network. Like many programming bugs, it stemmed
- from an improvement on the original system. By carrying
- information about who is calling whom on a separate channel, or
- band, from the call signal itself, AT&T has been able to reduce
- the time between dialing and ringing from as much as 20 seconds
- to as little as four seconds. But the refinement inadvertently
- made the system more prone to breakdowns. Last week's glitch
- spread rapidly among the 114 computers in AT&T's network in part
- because they all contained the same programming error.
-
- The collapse of its network came at a time of increased
- vulnerability for AT&T. Although Ma Bell still carries 70% of
- the U.S.'s long-distance traffic (down from 90% five years ago),
- it has been fighting a rearguard action to keep its customers
- from defecting to its feisty competitors, MCI and US Sprint. The
- glitch simultaneously deflated AT&T's multimillion-dollar
- "reliability" advertising campaign and handed its competitors
- a once-in-a-career sales pitch. "An important message to
- everyone whose telephone is the lifeline of their business,"
- began a print ad rushed out by US Sprint after the breakdown.
- "Always have two lifelines."
-
- AT&T operators made matters worse on Monday by refusing to
- give stranded customers instructions for calling via MCI or
- Sprint -- a standing order that was reversed 3 1/2 hours after
- the breakdown began, too late to do East Coast businesses any
- good. To help make amends, AT&T announced late last week that
- it had asked the Federal Communications Commission for
- permission to offer long-distance discounts to all callers on
- Valentine's Day. But the phone company's aura of infallibility
- will not be so easily repaired.
-
- That an operation as heavily computerized as AT&T's could
- have maintained such a reputation is a near miracle. To experts
- who track technological mishaps, the past decade reads like an
- unending parade of computer disasters, ranging from the
- humiliating bugs that delayed one space-shuttle launch after
- another to the Belgian stock-exchange computers that collapsed
- under the rush of sell orders during last October's minicrash.
- Computerized elevator doors have shut unexpectedly. Factory
- robots have started without warning, killing workers. A
- misprogrammed medical X-ray machine delivered fatal doses of
- radiation to at least three cancer patients.
-
- The vulnerability of all computer systems was underscored
- last week by separate court proceedings in California and New
- York. In San Jose three Silicon Valley workers were indicted for
- a range of computer crimes, including, for perhaps the first
- time, taking classified military information from Government
- computers. The next day a Cornell University graduate student
- made the first public explanation of how the rogue program he
- released into a research network in November 1988 managed to
- cripple some 6,000 university and military computers. "It was
- a mistake," Robert Morris said at his federal trial in
- Syracuse. "I'm sorry."
-
- But a trespassing hacker is just one of the problems that
- can bring a computer system to its knees. Technicians were
- installing extra disk drives in an underground computer in Tulsa
- last May when they triggered a collapse of American Airlines'
- SABRE reservation system. Last September a Parisian computer
- creatively misread magnetic labels on 41,000 traffic-violation
- files and began charging delinquent motorists with crimes
- ranging from murder and drug trafficking to prostitution. A fire
- in a Tokyo utility tunnel several years ago wiped out circuits
- connecting Mitsubishi Bank's mainframe computers with branch
- offices, shutting down automated-teller machines across the
- country for five days.
-
- Massive system failures dramatize the trade-off that occurs
- whenever a high-tech system replaces a low-tech one. Because
- most electronic systems are thoroughly interconnected, their
- failures tend to be all-or-nothing affairs. They do not, as
- computer scientists put it, degrade gracefully; they crash.
- Moreover, what is gained in speed and productivity is often lost
- in control, reliability and -- for lack of a better word --
- transparency. When a system of gears and levers stops working,
- its operators can roll up their sleeves, raise the hood and go
- to work. When a microchip goes bad, its circuits are unlikely
- to respond to on-the-spot ministrations.
-
- The risk for businesses is not so much that their systems
- will someday break down -- that is almost a given -- but that
- lingering computer anxiety in the buying public will make it
- harder for firms to recoup their investments in high-tech
- equipment and services. Banks and brokerage houses live in fear
- that one or two well-publicized computer failures will alienate
- their customer base, triggering mass defections to their
- competitors.
-
- There are ways to make the technology more reliable.
- Fault-tolerant computers like those built by Stratus, Tandem
- and, for that matter, AT&T reduce runaway system errors by a
- kind of "paranoid democracy," where modules working in parallel
- constantly evaluate whether their electronic co-workers are
- "sane" or "crazy." Unfortunately, as last week's breakdown
- showed, it is possible for all the modules to go crazy at once.
- Software, always the skittish part of any system, can also be
- made more dependable by imposing the kind of discipline on
- programmers that engineering standards impose on, say, bridge
- designers. A program like AT&T's faulty switching system,
- however, which can contain a million lines of code, is more
- complex than any bridge. "Standards have not been developed,"
- says Donn Parker, a senior management consultant at SRI
- International. "Software is not predictable."
-
- But automation is certain to become ever more pervasive. If
- U.S. firms do not develop the most advanced systems, Japanese
- or South Korean or European companies are sure to do so.
- "American industry faces an extremely competitive situation,"
- says Tandem President James Treybig. "AT&T is fighting to be in
- the forefront of technology, and there is some cost to staying
- in front." If Treybig is right, temporary setbacks are just the
- price of progress. But incidents like last week's are sure to
- influence the priorities of technology shoppers: reliability
- will be just as important as clever ads and fancy features.
-
-
- _________________________________________________________________
- BUGS, GLITCHES AND SNAFUS
-
-
- -- During a payday rush last year, a faulty program shut
- down 1,800 automated-teller machines at Tokyo's Dai-Ichi Kangyo
- Bank.
-
- -- When an airline's reservation system went down last
- year, 14,000 travel agents had to book flights manually.
-
- -- MasterCard processes more than 200,000 credit approvals
- on a typical day, so the phone breakdown wreaked havoc.
-
- -- The Dallas/Fort Worth air-traffic system began spitting
- out gibberish last fall and controllers had to track planes on
- paper.
-
- -- In the early 1980s, Buick had to give 80,000 V6 cars a
- chip transplant to fix flaws in their microprocessors.
-
- -- The U.S.S. Vincennes downed an Iranian airliner when the
- ship's crew misread a computer display.
-
-
-